1 Calculus
1.1 Limits
Definition 1.1 A sequence of real numbers is a (countably infinite) list \(a_1,a_2,...\) of real numbers. Often, we abbreviate this as \((a_n)_{n\in \mathbb N}\) or just \((a_n)\).
Definition 1.2 Let \((a_n)\) be a sequence of real numbers. We say that a limit of the sequence is \(L\), and write \[\lim_{n\to\infty}a_n = L\] if the sequence gets arbitrarily close to \(L\) for sufficiently large \(n\).
We can make this mathematically precise by digging into the grammar a bit. To say that it is arbitrarily close to \(L\) means that if we pick a target error \(\epsilon\), some small number, we want the \(a_n\) to be within \(\epsilon\) of \(L\), or in other words \(|L-a_n| < \epsilon\). This doesn’t have to happen for all \(a_n\), just those far enough out, meaning there’s a bound \(N\) such that for any index \(n>N\), we are \(\epsilon\)-close.
More concisely, we say that \(a_n\to L\) as \(n\to \infty\) if, for all \(\epsilon > 0\), there exists an \(N\) (depending on \(\epsilon\)) such that for all \(n > N\), we have \[|a_n - L| < \epsilon.\]
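For a concrete instance, take \(a_n = 1/n\) and \(L = 0\): given \(\epsilon > 0\), the bound \(N = \lceil 1/\epsilon\rceil\) works. A quick spot-check in Sage (the particular \(\epsilon\) and the sampled range of indices are arbitrary choices):

```python
# Spot-check the epsilon-N definition for a_n = 1/n -> 0.
epsilon = 1/1000                 # target error (a Sage rational)
N = ceil(1/epsilon)              # claimed witness: N = 1000
# every sampled index n > N is epsilon-close to the limit 0
all(abs(1/n - 0) < epsilon for n in range(N + 1, N + 500))
```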
A limit of a sequence does not need to exist in general. When it does exist, we say that the sequence converges or is convergent.
Example 1.1 The decimal expansion for a real number is a shorthand for a certain limit. Consider \[\pi = 3.141592653589...\] We can form a sequence \[3,3.1,3.14,3.141,3.1415,3.14159,...\] whose limit is \(\pi\).
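In Sage, we can generate these truncations directly (a small illustrative sketch; floor is used to chop rather than round):

```python
# Truncate pi after k decimal digits: floor(pi * 10^k) / 10^k
approximations = [floor(pi * 10^k) / 10^k for k in range(6)]
print(approximations)                       # [3, 31/10, 157/50, 3141/1000, ...]
print([a.n(digits=12) for a in approximations])
```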
The arithmetic of convergent sequences is quite straightforward. The following facts are likely familiar to you from your calculus courses:
Proposition 1.1
- A limit of a sequence, if it exists, is unique. Therefore, we can refer to it as the (unique) limit.
- If you add two convergent sequences term by term, the limits add accordingly.
- If you multiply two convergent sequences term by term, the limits multiply accordingly.
- If you scale a convergent sequence by a constant number, the limit scales accordingly.
- If you take the reciprocal of a sequence of non-zero terms, term by term, and the limit is also non-zero, then the reciprocals converge to the reciprocal of the limit.
Proof. See Exercise 1.1.
Two kinds of sequences, monotone and Cauchy, are special in real analysis. They are uniquely simple, in a certain sense.
Definition 1.3 A sequence is said to be monotone if it is always increasing or always decreasing from term-to-term, meaning that either \[a_{i+1} \geq a_i\ \ \ \ \textrm{or}\ \ \ \ a_{i+1}\leq a_i\] for all \(i\). In the first case, we may say it is monotone increasing, and in the second that it is monotone decreasing. If the inequalities are always strict (\(>\) or \(<\) only, respectively) we say that the sequence is strictly monotone, and either strictly increasing or strictly decreasing, respectively. Constant sequences are monotone but not strictly monotone.
Definition 1.4 A sequence is said to be Cauchy if pairs of terms \(a_m\) and \(a_n\) become arbitrarily close to each other when \(m\) and \(n\) are sufficiently large. Formally, we ask that for all \(\epsilon > 0\), there is a bound \(B\) such that for all \(m,n>B\) we have \[|a_n - a_m| < \epsilon.\] As in the definition of the limit, we think of \(\epsilon\) as a very small number, our target level of closeness.
1.2 Real Numbers
The main subtlety about limits of sequences is existence. For example, why does the sequence of decimal approximations of \(\pi\) (see Example 1.1) actually converge, let alone to \(\pi\)? What about other decimals – why are they sensible notation for real numbers?
In fact, the real numbers are defined/constructed as a natural “home” for limits of rational numbers, like decimal approximations. If a limit of rational numbers could plausibly exist, then it does so in the reals. You will learn more about this in a course on real analysis. For our purposes, the following facts summarize what we need to know about the interaction between limits and real numbers:
Theorem 1.1 Let \((a_n)\) be a sequence of real numbers. Then it converges to a real number in either of the following situations:
- \((a_n)\) is monotone and there is a real number \(B\) such that \(|a_n| < B\) for all \(n\).
- \((a_n)\) is Cauchy.
The first is especially useful in light of the following fact:
Lemma 1.1 If \((a_n)\) is a sequence of real numbers, then it has a monotone subsequence; in other words, there is an increasing list \(n_1<n_2<...\) such that the sequence \((b_k)\) given by \(b_k = a_{n_k}\) is monotone.
If the original sequence converges, then so do all of its subsequences, and they all converge to the same value. This leads to a natural strategy for analyzing sequences: first find a monotone subsequence and appeal to the monotone convergence theorem (the first part of Theorem 1.1), then find or characterize the limit of the subsequence, and finally check whether the original sequence converges to it.
For completeness, we mention a few more facts equivalent to Theorem 1.1 which we will occasionally use.
Definition 1.5 Let \(S\) be a subset of \(\mathbb R\). An upper bound (resp. lower bound) for \(S\) is a real number \(R\) such that \(R\geq s\) (resp. \(R\leq s\)) for all \(s\) in \(S\).
The least upper bound (resp. greatest lower bound) of \(S\) is an upper bound (resp. lower bound) which is no larger than (resp. no smaller than) any other upper bound (resp. lower bound) of \(S\).
The least upper bound, also called the supremum, is denoted \(\sup S\); the greatest lower bound, also called the infimum, is denoted \(\inf S\).
Theorem 1.2 The following are equivalent facts about the real numbers:
- Every bounded monotone sequence converges.
- Every Cauchy sequence converges.
- Every non-empty set \(S\) with an upper bound has a least upper bound.
- Every non-empty set \(S\) with a lower bound has a greatest lower bound.
- Every sequence of nested closed, bounded intervals has a non-empty intersection.
Some of these might sound or feel trivial, but note that they’re all false for the rational numbers. In fact, they even depend on the notion of size: there are other functions which look like the absolute value but which have very different behavior from the perspective of sequences and limits.
1.3 Continuity and IVT
Continuity is an important property of functions; it essentially says that they transform limits as nicely as one could hope (cf. Proposition 1.1). First, we need to define the notion of limits for functions.
Definition 1.6 Let \(f:\mathbb{R}\to \mathbb{R}\) be a function. Similar to sequence limits, we write \[ \lim_{x\to a}f(x) = b, \] and say that \(b\) is a limit of \(f\) at \(x=a\) (or as \(x\) goes to \(a\)) if \(f(x)\) becomes arbitrarily close to \(b\) on sufficiently small neighborhoods around \(a\). One can take \(a=\infty\) or \(a=-\infty\), in which case we mean that \(f(x)\) gets arbitrarily close to \(b\) as \(x\) grows very large (resp. very negative).
Another version of the limit notation is writing “\(f(x)\to b\) as \(x\to a\)” or even “\(f(x) \underset{x\to a}{\longrightarrow} b\)”.
The limit of a function captures the behavior of \(f\) near \(a\). Similarly, \(\lim_{x\to \infty}f(x)\) captures the behavior of \(f\) when \(x\) is large (“near infinity”), which is also described as the asymptotic behavior of \(f\).
The following real analysis fact relates function and sequence limits:
Proposition 1.2 Let \(f\) be a function. Then \(\lim_{x\to a}f(x) = b\) if and only if for any sequence \((x_n)\) approaching \(a\) but not containing \(a\), the sequence \((f(x_n))\) approaches \(b\).
Continuous functions are precisely those which transform limits of sequences “correctly”.
Definition 1.7 A real function \(f\) is said to be continuous at \(x=a\) if its limit at \(a\) agrees with its value there, i.e. if \[ f(a) = \lim_{x\to a}f(x). \] We say that \(f\) is continuous if it is continuous everywhere on its domain; more generally, we say that \(f\) is continuous on a subset \(A\) of its domain if \(f\) is continuous at all \(a\) in \(A\). No function is continuous on a set not contained in its domain.
Proposition 1.3 Let \(f\) be a function.
- A limit of a function is unique if it exists.
- The sum or product of two continuous functions is continuous.
- The reciprocal of a non-zero continuous function is continuous.
- If \(f\) is continuous at \(x=a\) and \(f(a)=b\), and \(g\) is continuous at \(x=b\), then the composition \(g\circ f\) is continuous at \(x=a\).
1.4 Order of Convergence
Some limits converge very “fast”, like \[\frac 1 {2^{2^n}} \to 0\ \ \textrm{as}\ \ n\to\infty,\] whereas others converge relatively slowly, like \[\sum_{k=1}^n \frac{(-1)^{k+1}}{k} \to \ln 2\ \ \textrm{as}\ \ n\to\infty.\]
Casually speaking, fast convergence means that, with respect to the definition of a limit, even a small \(\epsilon\) doesn’t require \(N\) to be too large, whereas slow means that a small \(\epsilon\) requires an enormous \(N\).
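A quick Sage experiment makes the contrast vivid (the sample sizes below are arbitrary choices):

```python
# Fast: 1/2^(2^n) is already astronomically small for modest n
print([(1 / 2^(2^n)).n(digits=3) for n in range(5)])
# Slow: partial sums of the alternating harmonic series creep toward log(2)
for n in [10, 100, 1000]:
    partial = sum((-1)^(k+1) / k for k in (1..n))
    print(n, (partial - log(2)).n(digits=3))
```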
We would like to make this precise. With numerical analysis in mind, we will often have methods that iteratively estimate solutions to certain problems, and we’d like to say something about how much accuracy is gained at each step. If convergence is slow, we might want to look for other methods, and if it’s fast, then we wouldn’t want to throw an excess of computing power (and money!) at it.
It turns out that much of calculus can be interpreted in terms of orders of convergence too; you could say that calculus is the study of the arithmetic of linearly-good approximations.
Definition 1.8 Let \(f(x)\) be a real function and \(a\) a point. We say that \(f(x)\) converges to \(0\) faster than linearly as \(x\) goes to \(a\) if we can write \[f(x) = c(x) (x-a)\] for some function \(c(x)\) such that \[\lim_{x\to a} c(x) = 0.\] Equivalently, \[\lim_{x\to a} \frac{f(x)}{x-a} = 0,\] in which case we can define \(c(x)\) to be \(f(x)/(x-a)\) when \(x\neq a\) and \(c(a) = 0\). We might also say that \(f\) converges to zero superlinearly as \(x\) goes to \(a\), or that \(f\) is much smaller than linear at \(x=a\).
The second formulation suggests the reason for saying that \(f\) is much smaller than linear. The denominator goes to zero, so the limit can only exist if the numerator also goes to zero (meaning \(f\) is small at \(x=a\)). But not only does the limit exist, it equals zero, rather than balancing at some finite number. That means \(f(x)\) must be even smaller than \(x-a\) for \(x\) near \(a\).
Superlinear convergence naturally suggests “superquadratic” convergence, where \((x-a)^2\) appears instead of \(x-a\). More generally, we can define convergence of arbitrary order.
Definition 1.9 Let \(f(x)\) be a real function and \(a\) a point. We say that \(f(x)\) converges to \(0\) with order \(n\) if \[f(x) = c(x) (x-a)^n\] for some function \(c(x)\) such that \[\lim_{x\to a} c(x) = 0.\] This is again equivalent to \[\lim_{x\to a} \frac{f(x)}{(x-a)^n} = 0.\]
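For instance, \(\sin(x) - x\) converges to zero with order \(2\) at \(x=0\), but not with order \(3\); a quick check in Sage (the function is an arbitrary illustration):

```python
x = var('x')
print(limit((sin(x) - x) / x^2, x=0))   # 0, so sin(x) - x is o(x^2) at 0
print(limit((sin(x) - x) / x^3, x=0))   # -1/6, so it is not o(x^3)
```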
We may also write this with “little-o” notation: \[f(x) = o\big((x-a)^n\big)\hspace{2em}(x\to a).\] In general, \(f(x) = o(g(x))\) as \(x\to a\) means that \(f(x)/g(x)\to 0\) as \(x\to a\).
This is a good way to compare functions: how can we make statements like “\(f(x)\) is really close to \(g(x)\) near \(x=a\)” more mathematically precise? This should be the same as saying that \(E_{f,g}(x) = f(x) - g(x)\) is quite small near \(x=a\). Order is one way of making this precise.
Definition 1.10 Let \(f\) and \(g\) be real functions. We say that they are equal up to order \(n\) (or equal modulo order \(n\)) at \(a\) if \(f(x) - g(x)\) converges to zero with order \(n\). Or, in other words, if \[E_{f, g}(x) = f(x) - g(x) = o\big((x-a)^n\big)\hspace{2em}(x\to a).\]
1.5 (Taylor) Polynomial Approximation
Now, we can apply the notion of “equal up to order \(n\)” to the problem of polynomial approximation. More precisely, given a function \(f\) and \(a\in\mathbb{R}\), we want to find, if possible, a polynomial \(P\) of degree at most \(n\) such that
- \(P(a) = f(a)\)
- \(P\) is equal to \(f\) up to order \(n\) at \(a\).
Definition 1.11 If such a polynomial exists, we call it the \(n\)th (Taylor) polynomial approximation to \(f(x)\) at \(x=a\).
Notice that if \(P(x)\) satisfies the two conditions above, without the degree stipulation, then so too does \(P(x) + (x-a)^{n+1}Q(x)\) for any polynomial \(Q(x)\) because
\[\begin{align*} \lim_{x\to a}\frac{f(x) - [P(x) + (x-a)^{n+1}Q(x)]}{(x-a)^n} &= \lim_{x\to a}\frac{f(x) - P(x)}{(x-a)^n} - \lim_{x\to a}\frac{(x-a)^{n+1}}{(x-a)^n}Q(x),\\ &= \lim_{x\to a}\frac{f(x) - P(x)}{(x-a)^n} - \lim_{x\to a}(x-a)Q(x),\\ &= 0 + 0 = 0. \end{align*}\]
Therefore, if we drop the degree requirement on \(P\), terms with degree \(\geq n+1\) can literally be anything. This is where the degree requirement comes from – it’ll guarantee uniqueness, so that it makes sense to refer to the \(n\)th Taylor polynomial approximation.
Theorem 1.3 The \(n\)-th order Taylor polynomial of a function \(f\) at \(a\) is unique (if it exists).
Proof. We only need to show that if \(P\) and \(Q\) are two \(n\)-th order Taylor polynomials of the same function \(f\) at the same point \(a\), they must be equal. So assume that \(P\) and \(Q\) are both equal to \(f\) up to order \(n\); then they are equal to each other up to order \(n\).
By assumption, we know two things:
\[ \lim_{x\to a} \frac{f(x) - P(x)}{(x-a)^n} = 0 \]
\[ \lim_{x\to a} \frac{f(x) - Q(x)}{(x-a)^n} = 0 \]
Subtracting the two and applying limit rules leads us to
\[ \lim_{x\to a} \frac{P(x)-Q(x)}{(x-a)^n} = 0 \]
The numerator is a difference of polynomials of degree at most \(n\), which we can write out in coefficients around \(x=a\), \[ P(x) - Q(x) = c_0 + c_1(x-a) + \dots + c_n(x-a)^{n}. \]
Now, we have \[ 0 = \lim_{x\to a} \frac{P(x) - Q(x)}{(x-a)^n} = \lim_{x\to a} \frac{c_0 + c_1(x-a) + \dots + c_n(x-a)^{n}}{(x-a)^n}. \]
The denominator goes to zero, so the limit can only exist if the numerator goes to zero. But the numerator is a polynomial, hence continuous, so that means the numerator has a root at \(x=a\), and therefore \(c_0 = 0\). Canceling a common factor doesn’t change the limit, and thus
\[ 0 = \lim_{x\to a} \frac{c_1 + \dots + c_n(x-a)^{n-1}}{(x-a)^{n-1}}. \]
Repeating the previous reasoning, we get \(c_1=0\), then \(c_2=0\) and so on. We won’t run out of \((x-a)\)’s in the denominator because the degree of the numerator is at most \(n\). Therefore, all the \(c_i\) are zero, so \(P(x) - Q(x) = 0\), as was to be shown (alternatively, one could prove by more analytic means that the first nonzero term \(c_k/(x-a)^{n-k}\) dominates the others).
1.6 Derivative
The derivative turns out to carry the same information as the \(1\)st Taylor polynomial approximation, the linear one. Any such linear approximation to \(f(x)\) at \(x=a\) must pass through the point \((a,f(a))\), so the only information needed to specify the line is its slope, because every such line takes the form \[L_m(x) = m(x-a) + f(a).\] That slope, \(m\), is the derivative. Of course, the line has to actually be a good linear approximation in the sense of the previous section. This is why derivatives do not always exist.
Linear approximations are desirable because lines are simple, and they have a lot of nice properties. Even though we expect higher order approximations to be more accurate (not always!) they could be harder to calculate. Plus, it turns out that higher order approximations can be calculated from a sequence of linear approximations.
Definition 1.12 The derivative of \(f\) at \(a\) is the slope of the best linear approximation (BLA) to \(f(x)\) at \(x=a\). This is the number \(m\) (a slope) such that the line \[L_m(x) = m(x-a) + f(a)\] is the first-order Taylor approximation to \(f(x)\) at \(x=a\). In other words, \(m\) must satisfy
\[\lim_{x\rightarrow a} \frac{f(x) - f(a) - m(x-a)}{x-a} = 0.\]
This number need not exist in general. When it does, we say that \(f\) is differentiable at \(a\). Typical notations for the derivative (the number \(m\)) include \(f'(a)\), \(\frac{df}{dx}(a)\), or \(\left.\frac{df}{dx}\right|_{x=a}\). The first two arise naturally from viewing the derivative of \(f\) as a function, written \(f'\) or \(\frac{df}{dx}\), which assigns to each \(a\) the corresponding slope.
Differentiation itself is then an operator, a function from functions to functions – in fact, it is even a linear operator.
Referring to the definition of order, it is easy to see that the derivative can be expressed as a limit:
\[f'(a) = \lim_{x\rightarrow a} \frac{f(x) - f(a)}{x-a}.\]
The quotient on the right hand side is naturally interpreted as the slope of the secant line through \((a,f(a))\) and \((x,f(x))\), the familiar limit definition from calculus. However, to see that this new definition really is equivalent to the usual one, we also need to show that when the usual derivative exists, it meets our criterion (homework).
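Sage will happily evaluate such secant-slope limits symbolically; for instance, with the illustrative choice \(f(x)=\sin(x)\) and a symbolic basepoint \(a\):

```python
x, a = var('x a')
f(x) = sin(x)
print(limit((f(x) - f(a)) / (x - a), x=a))   # cos(a), i.e. f'(a)
```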
There is a further way to write down the definition of the derivative. Consider the formula \[ f(x) - L_m(x) = c(x)(x-a). \]
We can rewrite this as \[ f(x) - f(a) = (m+c(x))(x-a). \] Therefore, \(f'(a)\) can also be defined as the value of \(m\) that makes this true for some \(c(x)\) satisfying \(\displaystyle \lim_{x\to a}c(x) = 0\). This form is convenient when we prove the composition rule later.
As we have defined it, the derivative of \(f\) can be viewed as the slope of the (asymptotically) best linear approximation to \(f\) at \(x=a\). The notion of order tells us that the error in the estimate, given by \[f(x) - [f(a) + f'(a)(x-a)],\] goes to zero faster than linearly as \(x\to a\) (in the precise sense of Definition 1.8). This is part of why the derivative is so helpful: it lets you estimate functions accurately.
Example 1.2 Suppose you’re told just a little information about a mystery function \(f(x)\): you know \(f(0) = 1\), and \(f'(x) = f(x)\). What is \(f(1)\)? At a first pass, we know \[f(x) \approx f(0) + f'(0)(x-0)\] for \(x\approx 0\). We know that \(f'(0) = f(0) = 1\), so we get \[f(x) \approx 1+x\] and so \(f(1) \approx 2\). Since it’s asymptotically best, we’d get a better result for values of \(x\) closer to zero. Take \(x=1/2\), and we can estimate \[f(0.5) \approx 1.5.\] But then we can start over with a fresh BLA using that information, \(f'(0.5) = f(0.5) \approx 1.5\) and hence \[f(x) \approx 1.5 + 1.5(x-0.5)\] for \(x\) near \(0.5\). If we then set \(x=1\), we arrive at \[f(1) \approx 1.5 + 1.5(0.5) = 2.25.\]
If one repeats this with smaller and smaller steps, we expect ever more accurate estimates for \(f(1)\). The only operations necessary for the estimate, because it’s the best linear approximation, are normal addition and multiplication.
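This repeated-BLA scheme is precisely Euler's method. A minimal sketch in Sage (the step counts below are illustrative):

```python
# Euler's method for the mystery function: f(0) = 1 and f'(x) = f(x).
def estimate_f1(steps):
    x, y = 0, 1
    h = 1 / steps                # step size (a Sage rational)
    for _ in range(steps):
        y = y + y * h            # BLA step: f(x + h) ≈ f(x) + f'(x)h, with f' = f
        x = x + h
    return y

# 1 and 2 steps reproduce the estimates 2 and 2.25 above; more steps
# approach the true value f(1) = e = 2.71828...
print([estimate_f1(s).n(digits=6) for s in [1, 2, 10, 100]])
```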
1.7 Derivative Rules
You are likely familiar with many derivative rules from your calculus classes. We will prove two of them here, and leave several others for your homework. A key idea is that the derivative lets us replace functions with lines, and (almost) all of the derivative rules correspond to operations on lines: what is the slope when you add two lines, for example?
The first rule we shall prove here is the product rule. In a real analysis course, one usually proves this by brute force. Here, we use a different argument that makes the above intuition rigorous while emphasizing the idea of approximation.
To arrive at the product rule, consider what happens when you multiply two lines: \[(a+bx)(c+dx) = ac + (ad +bc)x + bdx^2 \approx ac + (ad+bc)x\] The top degree term is small, so we dropped it, and the remaining “slope” of \(ad+bc\) has cross-terms involving the constant term and slope of each line. Slopes are derivatives, so we expect that \((fg)'\) will be of a similar form.
To prove this, we need a lemma to justify dropping small terms, essentially proving that order of convergence and multiplication are compatible. Then we prove the product rule by following the line calculation.
Lemma 1.2 Suppose \(f\) and \(\tilde f\) are equal up to order \(n\) at \(a\), and so are \(g\) and \(\tilde g\). Assume that all four functions admit limits at \(a\). Then \(fg\) and \(\tilde f\tilde g\) are equal up to order \(n\) at \(a\).
Proof. Write \(f(x) - \tilde f(x) = c_f(x)(x-a)^n\) and \(g(x) - \tilde g(x) = c_g(x)(x-a)^n\), where \(c_f(x)\) and \(c_g(x)\) both go to \(0\) as \(x\to a\). Then \[\begin{align*} (fg)(x) - (\tilde f\tilde g)(x) = f(x)g(x) - \tilde f(x)\tilde g(x) &= f(x)\big(g(x) - \tilde g(x)\big) + \tilde g(x)\big(f(x) - \tilde f(x)\big)\\ &= f(x)c_g(x)(x-a)^n + \tilde g(x)c_f(x)(x-a)^n\\ &= \big(f(x)c_g(x)+ \tilde g(x)c_f(x)\big)(x-a)^n \end{align*}\]
Now notice that \[ \lim_{x\to a } \big(f(x)c_g(x)+ \tilde g(x)c_f(x)\big) = 0, \] since \(f\) and \(\tilde g\) admit limits at \(a\) while \(c_f\) and \(c_g\) go to zero. Then, by definition, \(fg\) and \(\tilde f\tilde g\) are equal up to order \(n\) at \(a\).
Theorem 1.4 Suppose \(f\) and \(g\) are differentiable at \(a\). Then, \[(fg)'(a) = f'(a)g(a) + f(a)g'(a).\]
Proof. By definition, \(f(a) + f'(a)(x-a)\) and \(g(a) + g'(a)(x-a)\) are the unique linear approximations of \(f\) and \(g\) at \(a\), respectively. Now, by the preceding lemma, we see that \[\big(f(a) + f'(a)(x-a)\big)\big(g(a) + g'(a)(x-a)\big) = f(a)g(a) + \big(f(a)g'(a) + f'(a)g(a)\big)(x-a) + f'(a)g'(a)(x-a)^2\] is equal to \((fg)(x)\) up to order 1. For such polynomials, we saw already that the higher-degree term can be neglected (cf. the discussion of uniqueness of Taylor approximations) and that the linear part is unique, so the unique linear approximation of \(fg\) at \(a\) is given by \[ f(a)g(a) + \big(f(a)g'(a) + f'(a)g(a)\big)(x-a)\] Therefore, we have \[(fg)'(a) = f'(a)g(a) + f(a)g'(a).\]
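As a sanity check, Sage can verify the rule symbolically for generic (unspecified) formal functions \(f\) and \(g\):

```python
x = var('x')
f = function('f'); g = function('g')
print(diff(f(x) * g(x), x))   # g(x)*D[0](f)(x) + f(x)*D[0](g)(x)
```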
What if you add two lines? You’ll get \[(ax+b) + (cx +d) = (a+c)x + (b+d).\] Adding lines adds slopes, which suggests that adding functions adds derivatives (see Exercise 1.3).
What happens when you compose two lines? Well, \[(a+bx)\circ (c+dx) = a + b(c+dx) = a + bc + bdx.\] The slopes end up multiplying. So we should expect \((f\circ g)'\) to be a product of \(f'\) and \(g'\). The only small twist is making sure we multiply the correct two slopes. These are easy to guess: \((f\circ g)(a)\) requires finding \(g(a)\), suggesting \(g'(a)\), and then calculating \(f(g(a))\), which suggests \(f'(g(a))\). Reasoning with lines, therefore, leads us to think \[(f\circ g)'(a) = f'(g(a))g'(a),\] which is indeed the correct composition rule. As above, we can basically follow the linear reasoning.
Theorem 1.5 Let \(f\) and \(g\) be two real functions. Suppose \(g\) is differentiable at \(x=a\) and \(f\) is differentiable at \(x=g(a)\). Then \(f\circ g\) is differentiable at \(x=a,\) and \[(f\circ g)'(a)= f'(g(a))g'(a)\]
Proof. For variety, let’s use a different definition to prove this. The benefit of having different equivalent definitions is that you can switch freely between them!
By assumption, we have
\[\begin{align*} g(x) - g(a) &= \big(g'(a) + c_g(x)\big)(x-a)\\ f(y) - f(g(a)) & = \big(f'(g(a))+ c_f(y)\big)(y-g(a)) \end{align*}\] where \(c_g(x)\to 0\) as \(x\to a\) and \(c_f(y)\to 0\) as \(y\to g(a)\).
Then, \[\begin{align*} f(g(x)) - f(g(a)) &= \big(f'(g(a)) +c_f(g(x))\big)(g(x)-g(a))\\ &= \big(f'(g(a)) + c_f(g(x))\big)\big(g'(a) + c_g(x)\big)(x-a)\\ &= \bigg[f'(g(a))g'(a) + g'(a)c_f(g(x)) + f'(g(a)) c_g(x) + c_f(g(x))c_g(x) \bigg](x-a)\\ &= f'(g(a))g'(a)(x-a) + \bigg[g'(a)c_f(g(x)) + f'(g(a)) c_g(x) + c_f(g(x))c_g(x) \bigg](x-a)\\ \end{align*}\]
After dividing by \(x-a\), we still know that the three terms in the brackets go to \(0\) as \(x\to a\): differentiability implies continuity, so \(g(x)\to g(a)\) as \(x\to a\), hence \(c_f(g(x))\to c_f(g(a))=0\), and \(c_g(x)\to 0\) by definition. Therefore, \((f\circ g)'(a)= f'(g(a))g'(a)\).
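The same kind of symbolic check works for the composition rule; here \(f\) and \(g\) are again unspecified formal functions:

```python
x = var('x')
f = function('f'); g = function('g')
print(diff(f(g(x)), x))   # D[0](f)(g(x))*D[0](g)(x), i.e. f'(g(x))*g'(x)
```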
1.8 Derivatives in Sage
Sage makes it very easy to calculate symbolic derivatives of standard functions:
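For instance (a minimal example; the particular function \(\sin(x)e^x\) and the evaluation point are arbitrary illustrations):

```python
# 1. Define a symbolic variable and a (callable) symbolic function.
x = var('x')
f(x) = sin(x) * e^x

# 2. Two syntax options for symbolic derivatives.
fp = diff(f, x)
fp = f.derivative(x)     # equivalent

# 3. Two syntax options for evaluating the derivative.
print(fp(1))             # call the callable derivative directly
print(fp(x=1).n())       # or substitute by keyword, then get a numerical value
```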
1.9 The Mean Value Theorem
The Mean Value Theorem (MVT) is one of the most powerful results in real analysis. There are several variations, not all equivalent, including: the fundamental theorem of the derivative, the inequality form/racetrack principle, the secant form, and the Cauchy MVT.
Among other applications, it makes precise the following intuition: if you know where a function starts, and you know how fast it grows, then you know the function (the uniqueness theorem for simple ODEs!). This fact turns out to be deceptively subtle.
The following is sometimes attributed to Fermat.
Lemma 1.3 Let \(f\) be a real function which attains a local extremum at \(c\). If \(f\) is differentiable at \(c\), then \(f'(c) = 0\).
Proof. Let’s suppose the extremum is a local max; the argument for local mins is identical (or use \(-f\) and appeal to the derivative rules).
Recall that
\[f'(c) = \lim_{x\to c}\frac{f(x) - f(c)}{x-c}\]
Now, since \(f(c)\) is a local max, there is a neighborhood \((c - \delta, c + \delta)\) around \(c\) such that for all \(x\) in this neighborhood, \(f(x) \leq f(c)\).
Thus, for any \(x\) in this neighborhood with \(x<c\), we have \(\displaystyle \frac{f(x) - f(c)}{x-c}\geq 0\), and for any such \(x>c\) we have \(\displaystyle \frac{f(x) - f(c)}{x-c}\leq 0\). Therefore, when \(x\) approaches \(c\) from the left, \[\lim_{x\to c^-}\frac{f(x) - f(c)}{x-c}\geq 0,\] and when \(x\) approaches from the right, \[\lim_{x\to c^+}\frac{f(x) - f(c)}{x-c}\leq 0.\] For the limit to exist, the left and right limits must coincide. This leaves only one possibility, zero, so \[f'(c) = \lim_{x\to c}\frac{f(x) - f(c)}{x-c} = 0.\]
Theorem 1.6 (Rolle’s theorem) Let \(f\) be continuous on \([a, b]\) and differentiable on \((a, b)\). Suppose \(f(a) = f(b)\). Then, there is some \(c\in (a, b)\) where \(f'(c) = 0\).
Proof. Since \(f\) is continuous on \([a, b]\), it attains maximum and minimum values. Suppose first that the maximum or the minimum is attained at some point \(c\) in the open interval \((a, b)\). Since \(f\) is differentiable there, we can appeal to the preceding lemma to conclude that \(f'(c) = 0\).
Otherwise, neither the maximum nor the minimum is attained in the open interval, which means both are attained at the endpoints. However, \(f(a) = f(b)\), so this common value is both the minimum and the maximum of \(f\) on \([a, b]\), which implies that \(f\) is constant on \([a,b]\). In this case, any point \(c\in(a, b)\) satisfies \(f'(c) = 0\).
We don’t actually need to assume continuity at the endpoints; it is enough to assume that the limits of \(f\) at \(a\) and \(b\) both exist and are equal. As far as I know, the stronger assumption lingers for historical reasons. If the limits exist, we can always extend \(f\) by defining its values at the endpoints to be the corresponding limits. Notice that this operation doesn’t change the derivative of \(f\) on \((a, b)\): derivatives only depend on the behavior of a function in arbitrarily small neighborhoods, and for any \(x\in(a, b)\) you can find a small enough neighborhood of \(x\) that separates it from the endpoints.
With this, we can prove Cauchy’s MVT. The MVT you may know from calculus is the special case \(g(x)=x\), which arises naturally from the secant interpretation of the derivative. In essence, Cauchy’s MVT upgrades us from derivatives with respect to \(x\) to derivatives with respect to other functions. You’ve likely seen things like \[\frac{df}{d\ln},\ \ \ \ \frac{df}{de^x},\ \ \ \ \frac{df}{dg}\] in your other courses, and this theorem tells us that they behave as nicely as their notation suggests. In particular, l’Hôpital’s rule will follow quickly from this theorem.
Theorem 1.7 (Cauchy’s MVT) Let \(f\) and \(g\) be continuous on \([a, b]\) and differentiable on \((a, b)\). Then, there is some point \(c\in (a, b)\) where \[ f'(c)(g(b) - g(a)) = g'(c)(f(b) - f(a)). \]
First, some motivation. Recall from linear algebra that two vectors \((x_1, y_1)\) and \((x_2, y_2)\) are parallel if and only if \(x_1y_2 = x_2y_1\). Therefore, if we let \[\vec{h}(x) = \big(f(x), g(x)\big),\] what we need to prove is that there is some \(c\in(a, b)\) where the vector \[\vec{h}'(c) = \big(f'(c), g'(c)\big)\] is parallel to \[\vec{h}(b) - \vec{h}(a) = \big(f(b) - f(a), g(b) - g(a)\big).\] (Aside: the code block in the companion notebook implements a simple demo; a stand-in appears below.) Geometrically, we want to find a value of \(c\) such that the red and the green lines are parallel. If you draw a graph for Rolle’s theorem, you will see the scenarios are exactly the same, except that this time our diagram is tilted. Therefore, to reduce this case to Rolle’s theorem, we want to define a function \(t(x)\) that reflects the “vertical position” of \(\vec{h}(x)\) in the direction perpendicular to the red line, so that \(t(a) = t(b)\) and we can apply Rolle’s theorem to \(t(x)\). One way to do this is by taking the dot product of \(\vec{h}(x)\) with some fixed vector perpendicular to \(\vec{h}(b) - \vec{h}(a)\) (the red line). A convenient choice is to exchange the \(x\) and \(y\) coordinates and then flip the sign of the new \(x\) coordinate. This gives \(\big(g(a) - g(b), f(b) - f(a)\big)\).
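A minimal stand-in for such a demo in Sage (the choices of \(f\), \(g\), and the interval below are illustrative, not the notebook's exact code) solves for the point \(c\) where the tangent and chord directions are parallel:

```python
# Find c in (a, b) where the tangent direction (f'(c), g'(c)) is parallel
# to the chord h(b) - h(a); f, g, and the interval are illustrative choices.
f(x) = x^2; g(x) = x^3
a, b = 1, 2
eqn = diff(f, x)(x) * (g(b) - g(a)) == diff(g, x)(x) * (f(b) - f(a))
print(solve(eqn, x))   # [x == 0, x == 14/9]; the root 14/9 lies in (1, 2)
```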
Proof. Regardless of whether you like the above geometric motivation, you can verify that if we define \[ t(x) = f(x)(g(a) - g(b)) + g(x)(f(b) - f(a)), \] then we have \(t(a) = f(b)g(a) - f(a)g(b) = t(b)\), and that \(t(x)\) is differentiable on \((a, b)\) and continuous on \([a, b]\). Now, by Rolle’s theorem, there is some point \(c\in (a, b)\) where \[ 0 = t'(c) = f'(c)(g(a) - g(b)) + g'(c)(f(b) - f(a)). \] Therefore, at this point \(c\) we have \(f'(c)(g(b) - g(a)) = g'(c)(f(b) - f(a))\).
Similar to the case of Rolle’s theorem, this theorem remains true if we don’t assume continuity at the endpoints but only the existence of the limits there. Of course, in this case we need to replace \(f(a)\), \(f(b)\), \(g(a)\), \(g(b)\) by the corresponding limits. We may call this slightly stronger result the extended Cauchy MVT.
1.10 l’Hôpital’s Rule
Interestingly, l’Hôpital did not prove his rule. A mathematician in his employ (well, under his patronage) proved it, so his name was attached. It’s a handy way of calculating limits.
Note that the special case \(g(x) = x-a\) furnishes the definition of the derivative – what is remarkable is that one can “cancel” the \(x-a\) when “dividing” one definition of the derivative by another.
Theorem 1.8 Suppose that \(f\to 0\) and \(g\to 0\) as \(x\to a\), and that \[\lim_{x\to a} \frac{f'(x)}{g'(x)}\] exists. Then the limit \[\lim_{x\to a} \frac{f(x)}{g(x)}\] also exists, and \[ \lim_{x\to a} \frac{f(x)}{g(x)} =\lim_{x\to a} \frac{f'(x)}{g'(x)}. \]
Proof. Recall the useful real analysis fact (Proposition 1.2) that \[\lim_{x\to a}f(x) = b\] if and only if for any sequence \((x_i)\) approaching \(a\) but not containing \(a\), the sequence \((f(x_i))\) also approaches \(b\). So take such a sequence and let \(\displaystyle L = \lim_{x\to a} \frac{f'(x)}{g'(x)}\). Then, by this fact, we only need to show that \[ \lim \frac{f(x_i)}{g(x_i)} = L. \]
For each \(x_i\), apply the (extended) Cauchy MVT to \(f\) and \(g\) on the interval \((a, x_i)\) (or \((x_i, a)\) if \(x_i < a\)). We obtain a sequence \(c_i\) between \(a\) and \(x_i\) where \[f'(c_i)(g(x_i) - 0) = g'(c_i)(f(x_i) - 0),\] or equivalently \[ \frac{f'(c_i)}{g'(c_i)} = \frac{f(x_i)}{g(x_i)}. \]
Now, \(x_i\to a\) and each \(c_i\) lies between \(x_i\) and \(a\), so \((c_i)\) is also a sequence approaching \(a\) but not containing \(a\). Therefore, by the other direction of the useful fact, we have \[\lim \frac{f(x_i)}{g(x_i)} = \lim \frac{f'(c_i)}{g'(c_i)} = L.\]
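As a sanity check, Sage can evaluate both sides of the rule for a standard \(0/0\) form (the example function pair is an arbitrary choice):

```python
x = var('x')
f(x) = 1 - cos(x); g(x) = x^2
print(limit(f(x) / g(x), x=0))                       # 1/2
print(limit(diff(f, x)(x) / diff(g, x)(x), x=0))     # also 1/2, as the rule predicts
```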
1.11 Higher Order Derivatives and Taylor Polynomials
Definition 1.13 We define the second derivative of \(f\) at \(a\) to be the derivative of its derivative function at \(a\). Third and higher derivatives are similarly defined. We write \[f^{(n)}(a)\] to denote the \(n\)th order derivative of \(f\) at \(a\). If it exists, we say that \(f\) is \(n\)-times differentiable (on some domain, at some point). You can define it inductively as \[f^{(n)}(a) = (f^{(n-1)})'(a)\]
Some people would naturally suggest defining higher derivatives as coefficients of higher-degree approximations. The derivative is the slope of the first-order approximation, so one might want to define the second derivative to be the leading coefficient of the second-order approximation. In other words, if \[c_0 + c_1(x-a) + c_2(x-a)^2\] is an order-two approximation to \(f\) at \(x=a\), it seems reasonable to define the second derivative to be \(c_2\) – it’s not hard to see that \(c_0 = f(a)\) and \(c_1=f'(a)\), so this would give us \[f(a) + f'(a)(x-a) + f''(a)(x-a)^2\] as a second-order approximation, and so on.
For historical reasons, this is not our definition, and it would give you numbers that disagree with the standard ones; in fact, the leading coefficient is \(f''(a)/2\) under the standard definitions. The only advantage of the standard form is that you just iterate differentiation without having to think. Interestingly, the coefficient convention is the right definition for parts of number theory.
Theorem 1.9 Let \(n\geq1\) be an integer, and let \(f\) and \(g\) be two functions whose derivatives exist at \(a\) up to order \(n\). Then, \(f\) and \(g\) are equal up to order \(n\) at \(a\) if \(f(a) = g(a)\) and the derivatives up to order \(n\) all agree.
Proof. If \(n = 1\), then since \(f(a) = g(a)\) we have
\[ \lim_{x\to a}\frac{f(x) - g(x)}{x-a}= \lim_{x\to a}\bigg(\frac{f(x) - f(a)}{x-a} - \frac{g(x) - g(a)}{x-a}\bigg) = f'(a) - g'(a) = 0 \]
If \(n\geq 2\), then \(f'\) and \(g'\) are continuous at \(a\), so we can apply l’Hôpital’s Rule to obtain \[ \lim_{x\to a}\frac{f(x) - g(x)}{(x-a)^{n}} = \lim_{x\to a}\frac{f'(x) - g'(x)}{n(x-a)^{n-1}} \]
We need the continuity of \(f'\) and \(g'\) to show that the limit on the right-hand side exists, which has to be established before applying l’Hôpital’s Rule. Since \(n\) in the denominator is just a constant, we see that when \(n\geq 2\), \(f\) and \(g\) being equal up to order \(n\) reduces to \(f'\) and \(g'\) being equal up to order \(n-1\). After repeating this reduction \(n-1\) times, the proof is reduced to showing that \(f^{(n-1)}\) and \(g^{(n-1)}\) are equal up to order \(1\). This is the first case: we assumed that \(f^{(n)}(a)\) and \(g^{(n)}(a)\) are equal.
Corollary 1.1 Suppose, as in Theorem 1.9, that \(f\) is \(n\)-times differentiable at \(a\), and define
\[ P_{f, n, a}(x) = \sum_{k = 0}^{n}\frac{f^{(k)}(a)}{k!}(x - a)^k \]
Then \(P_{f, n, a}(x)\) is the unique \(n\)-th order Taylor approximation of \(f\) at \(a\).
Proof. Just check that \(P_{f, n, a}\) and \(f\) have the same value and the same derivatives at \(a\) up to order \(n\), then apply Theorem 1.9; uniqueness is Theorem 1.3.
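Sage’s built-in taylor computes exactly this polynomial; here is a quick cross-check against the coefficient formula, using the arbitrary example \(f(x)=e^x\) at \(a=0\):

```python
x = var('x')
print(taylor(e^x, x, 0, 3))   # 1/6*x^3 + 1/2*x^2 + x + 1
# the k-th coefficient is f^(k)(0)/k!, matching the formula for P
print([diff(e^x, x, k).subs(x=0) / factorial(k) for k in range(4)])
```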
Exercises
Exercise 1.1 Using the definition, verify Proposition 1.1.
Exercise 1.2 Try using a degree-two Taylor polynomial, the best quadratic approximation, to estimate values of the function from Example 1.2. What do you notice?
Exercise 1.3 Assume \(f\) and \(g\) are differentiable at \(x=a\). Prove the sum rule, \[(f+g)'(a) = f'(a) + g'(a),\] from the BLA/Taylor definition. Remember, adding lines adds slopes.
Exercise 1.4 Suppose that we know \(f(x) = 2-(x-3) + E(x)\) near \(x=3\), where \(2-(x-3)\) is the BLA, and we know that \(|E(x)|<1\) on the interval \((2,4)\).
- Let \(g(x)=x^3 - 2x^2 + 3\). Determine the BLA for \(g\circ f\) at \(x=3\).
- Can you determine a bound for the error term associated to the BLA of \(g\circ f\) on \((2, 4)\)? If so, provide it; if not, what information would you need to be able to do so?
- Suppose we have a function \(h(x)\) differentiable at \(x = 0\), and such that \(h(0) = 3\). Can you bound the error for the BLA of \(f\circ h\) near zero? If so, provide a bound; if not, what information would you need to be able to do so?
Exercise 1.5 Let \(f(x)\) be a polynomial of degree at least \(1\) and \(a\) a real number. Prove that \[g(x) = \frac{f(x) - f(a)}{x-a}\] is also a polynomial.
Exercise 1.6 Let \(f(x)\) be a bounded function, meaning that there is a number \(B\) such that \(|f(x)| < B\) for all \(x\). Prove that \[x f(x) \to 0\ \ \ \ \textrm{as}\ \ \ \ x\to 0.\]
Exercise 1.7 Define the sequence \[a_n = \sum_{k=0}^n 2^{-k}.\] Prove that \(a_n\to 2\) as \(n\to \infty\) from the definition. You can assume that \(\lim_{n\to\infty} 2^{-n} = 0\).
Exercise 1.8 Show that the limit definition of the derivative is equivalent to the definition in terms of BLAs (first order taylor approximations).
Exercise 1.9 Suppose \(f\) is differentiable at \(x=a\). Show that \(f\) is continuous at \(x=a\).
Exercise 1.10 In this problem, we will use the BLA definition of the derivative to prove that if \(f,g,1/g\) are differentiable at \(x=a\) and \(g(a)\neq 0\), then \(f/g\) is differentiable at \(x=a\) and its derivative is \[\left(\frac f g\right)'(a) = \frac{f'(a)g(a) - f(a)g'(a)}{g(a)^2}.\]
(a) Calculate the derivative of \(\dfrac 1 g\) by expressing the identity \[g \frac 1 g = 1\] in terms of BLAs.
(a’) Alternatively, calculate the derivative of \(f(x) = 1/x\). Then apply the composition rule to determine the derivative of \(\dfrac 1 g\).
(b) Combine the result of (a) or (a’) with the product rule.
Exercise 1.11 Let \(n>m\) be positive integers. Show that if \(f\) converges to zero at \(x=a\) with order \(n\), then it also converges to zero with order \(m\).
Exercise 1.12 Let \(P(t)\) be a function defined almost everywhere which is known to have the following properties: \[P(1) = 4\ \ \ \ P'(t) = tP(t)^2.\]
- Using a best quadratic approximation at \(t=1\), estimate \(P(0)\) and \(P(-1)\).
- Determine \(P''(t)\) (in terms of \(P\)). Assuming the function is defined at \(t=0\), do you think the critical point at \(t=0\) is a minimum, a maximum, or neither? Explain. Hint: squares are always positive.
- Looking at your answers to the previous parts, do you think your approximations are accurate? Do you think you have over- or under-estimated the true values? Briefly explain your reasoning (you do not need to provide a proof).